Empirical Study of Utilizing Morph-Syntactic Information in SMT
نویسندگان
چکیده
In this paper, we present an empirical study that utilizes morph-syntactical information to improve translation quality. With three kinds of language pairs matched according to morph-syntactical similarity or difference, we investigate the effects of various morpho-syntactical information, such as base form, part-of-speech, and the relative positional information of a word in a statistical machine translation framework. We learn not only translation models but also word-based/class-based language models by manipulating morphological and relative positional information. And we integrate the models into a log-linear model. Experiments on multilingual translations showed that such morphological information as part-of-speech and base form are effective for improving performance in morphologically rich language pairs and that the relative positional features in a word group are useful for reordering the local word orders. Moreover, the use of a class-based n-gram language model improves performance by alleviating the data sparseness problem in a word-based language model.
منابع مشابه
پارس مورف: تحلیلگر صرفی زبان فارسی
In this paper, the theoretical foundation, the way of implementation and the uses of Pars Morph, a Persian morphological analyzer is introduced. Pars Morph is a rule-based Persian morphological analysis system, which analyzes the internal structure of word in Persian and determines the grammatical category and function of the word parts. Pars Morph being in link with a lexicon covering about 45...
متن کاملA Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
متن کاملImprovement of the Results of Statistical Machine Translation System using Anusaaraka
This paper describes an efficient experimental approach for the improvement of translation quality of phrase based statistical machine translation system by utilizing the insights of the rule based machine translation. As the most primitive step it is believed that appending large and accurately designed linguistic resources such as multiword bilingual dictionaries to the existing training corp...
متن کاملPractical Approach to Syntax-based Statistical Machine Translation
This paper presents a practical approach to statistical machine translation (SMT) based on syntactic transfer. Conventionally, phrase-based SMT generates an output sentence by combining phrase (multiword sequence) translation and phrase reordering without syntax. On the other hand, SMT based on tree-to-tree mapping, which involves syntactic information, is theoretical, so its features remain un...
متن کاملCan Semantic Role Labeling Improve SMT?
We present a series of empirical studies aimed at illuminating more precisely the likely contribution of semantic roles in improving statistical machine translation accuracy. The experiments reported study several aspects key to success: (1) the frequencies of types of SMT errors where semantic parsing and role labeling could help, and (2) if and where semantic roles offer more accurate guidanc...
متن کامل